You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com.

Posts with tag Digital Humanities Now Editors' Choice


Feb 25 2013

Now back to some texts for a bit. Last spring, I posted a few times about the possibilities for reading genders in large collections of books. I didn't follow up because I have some concerns about just what to do with this sort of pronoun data. But after talking about it to Ryan Cordell's class at Northeastern last week, I wanted to think a little bit more about the representation of male and female subjects in late-19th century texts. There were further spurs as well: Matt Jockers recently posted the pronoun usage in his corpus of novels, and Jeana Jorgensen pointed to recent research by Kathleen Ragan suggesting that editorial and teller effects have a massive effect on the gender of protagonists in folk tales. Bookworm gives a great platform for looking at this sort of question.
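As a minimal sketch of what "pronoun data" might mean at its simplest (my own illustration of the counting idea, not the actual Bookworm query), one can tally gendered pronouns in a text and use the ratio as a crude proxy for male versus female subjects:

```python
import re
from collections import Counter

# Assumed pronoun lists; a real study would also weigh possessives,
# names, and honorifics, and would work over millions of books.
MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def pronoun_balance(text):
    """Return (male_count, female_count) of gendered pronouns in text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    male = sum(words[w] for w in MALE)
    female = sum(words[w] for w in FEMALE)
    return male, female
```

This is only a toy: at corpus scale the interesting questions are about how that balance varies by genre, decade, and author gender.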

Jan 09 2013

Following up on my previous topic modeling post, I want to talk about one thing humanists actually do with topic models once they build them, most of the time: chart the topics over time. Since I think that, although topic modeling can be very useful, there's too little skepticism about the technique, I'm venturing to provide it (even with, I'm sure, a gross misunderstanding or two). More generally, the sorts of mistakes temporal changes cause should call into question the complacency with which humanists tend to treat the topics in topic modeling as stable abstractions, and argue for much greater attention to the granular words that make up a topic model.

Nov 14 2012

Note: this post is part 5 of my series on whaling logs and digital history. For the full overview, click here.

Nov 02 2012

Note: this post is part 4, section 2 of my series on whaling logs and digital history. For the full overview, click here.

May 08 2012

It's pretty obvious that one of the many problems in studying history by relying on the print record is that writers of books are disproportionately male.

Feb 02 2012

Though I usually work with the Bookworm database of Open Library texts, I've been playing a bit more with the Google Ngram data sets lately, which have substantial advantages in size, quality, and time period. Largely I use them to check or search for patterns I can then analyze in detail with full-text data; but there's also a lot more that could be coming out of the Ngrams set than what I've seen in the last year.

Jan 05 2012

[This is not what I'll be saying at the AHA on Sunday morning, since I'm participating in a panel discussion with Stefan Sinclair, Tim Sherrat, and Fred Gibbs, chaired by Bill Turkel. Do come! But if I were to toss something off today to show how text mining can contribute to historical questions, and what sorts of questions we can answer, now, using simple tools and big data, this might be the story I'd start with to show how much data we have, and how little things can have different meanings at big scales.]

Dec 16 2011

When data exploration produces Christmas-themed charts, that's a sign it's time to post again. So here's a chart and a problem.

Nov 19 2011

Ted Underwood has been talking up the advantages of the Mann-Whitney test over Dunning's log-likelihood, which is currently more widely used. I'm having trouble getting Mann-Whitney running on large numbers of texts as quickly as I'd like, but I'd say that his basic contention (that Dunning's log-likelihood is frequently not the best method) is definitely true, and there's a lot to like about rank-ordering tests.

Oct 06 2011

Historians often hope that digitized texts will enable better, faster comparisons of groups of texts. Now that at least the 1grams on Bookworm are running pretty smoothly, I want to start to lay the groundwork for using corpus comparisons to look at words in a big digital library. For the algorithmically minded: this post should act as a somewhat idiosyncratic approach to Dunning's log-likelihood statistic. For the hermeneutically minded: this post should explain why you might need _any_ log-likelihood statistic.
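As a reference point, here is a minimal sketch of Dunning's log-likelihood (G2) for a single word's counts in two corpora; this is my own illustration of the standard statistic, not the Bookworm implementation:

```python
import math

def dunning_g2(a, b, corpus1_size, corpus2_size):
    """Dunning's log-likelihood (G^2) for one word.

    a, b: the word's counts in corpus 1 and corpus 2.
    corpus1_size, corpus2_size: total word counts of each corpus.
    Large values mean the word's frequency differs more between the
    corpora than chance would suggest.
    """
    total = corpus1_size + corpus2_size
    # Expected counts if the word were used at the same rate in both corpora.
    e1 = corpus1_size * (a + b) / total
    e2 = corpus2_size * (a + b) / total
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2
```

For example, a word appearing 150 times in one corpus and 50 times in an equally sized second corpus scores about 52, while equal counts score 0; ranking every word in the vocabulary by this score gives the characteristic "most distinctive words" lists that corpus comparisons produce.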